With modern technology development, functional data are being
observed frequently in many scientific fields. A popular method
for analyzing such functional data is "smoothing first, then estima- 
tion." That is, statistical inference such as estimation and hypothesis 
testing about functional data is conducted based on the substitu- 
tion of the underlying individual functions by their reconstructions 
obtained by one smoothing technique or another. However, little is 
known about this substitution effect on functional data analysis. In 
this paper this problem is investigated when the local polynomial 
kernel (LPK) smoothing technique is used for individual function 
reconstructions. We find that under some mild conditions, the sub- 
stitution effect can be ignored asymptotically. Based on this, we con- 
struct LPK reconstruction-based estimators for the mean, covariance 
and noise variance functions of a functional data set and derive their 
asymptotics. We also propose a GCV rule for selecting good band- 
widths for the LPK reconstructions. When the mean function also 
depends on some time-independent covariates, we consider a func- 
tional linear model where the mean function is linearly related to the 
covariates but the covariate effects are functions of time. The LPK 
reconstruction-based estimators for the covariate effects and the
covariance function are also constructed and their asymptotics are
derived. Moreover, we propose an $L^2$-norm-based global test statistic
for a general hypothesis testing problem about the covariate effects 
and derive its asymptotic random expression. The effect of the band- 
widths selected by the proposed GCV rule on the accuracy of the LPK 
reconstructions and the mean function estimator is investigated via 
a simulation study. The proposed methodologies are illustrated via 
an application to a real functional data set collected in climatology. 
1. Introduction. Functional data consist of functions which are often
smooth but usually corrupted with noise. With modern technology devel- 
opment, such functional data are being observed frequently in many scien- 
tific fields; see Besse and Ramsay [3], Ramsay [20] and Ramsay and Dalzell 
[21], among others, for good examples and analyses. Comprehensive surveys 
about functional data analysis (FDA) can be found in [23, 24]. 

Mathematically, the above-mentioned functional data may be modeled as 
independent realizations of an underlying stochastic process, 

(1.1)    $y_i(t) = \eta(t) + v_i(t) + \varepsilon_i(t), \qquad i = 1, 2, \ldots, n,$

where $\eta(t)$ models the population mean function of the stochastic process,
$v_i(t)$ is the $i$th individual variation (subject-effect) from $\eta(t)$,
$\varepsilon_i(t)$ is the $i$th measurement error process and $y_i(t)$ is the
$i$th response process. Without loss of generality, throughout this paper we
assume the stochastic process has finite support, that is,
$t \in \mathcal{T} = [a, b]$, $-\infty < a < b < \infty$. Moreover, we assume
$v_i(t)$ and $\varepsilon_i(t)$ are independent, and are independent copies of
$v(t) \sim \mathrm{SP}(0, \gamma)$ and
$\varepsilon(t) \sim \mathrm{SP}(0, \gamma_\varepsilon)$,
$\gamma_\varepsilon(s, t) = \sigma^2(t) 1_{\{s = t\}}$, respectively, where and
throughout $\mathrm{SP}(\eta, \gamma)$ denotes a stochastic process with mean
function $\eta(t)$ and covariance function $\gamma(s, t)$. It follows that the
underlying individual functions (trajectories)
$f_i(t) = E\{y_i(t) | v_i(t)\} = \eta(t) + v_i(t)$ are i.i.d. copies of
the underlying stochastic process, $f(t) = \eta(t) + v(t) \sim \mathrm{SP}(\eta, \gamma)$.
In practice, functional data are observed discretely. Let $t_{ij}$,
$j = 1, 2, \ldots, n_i$, be the design time points of the $i$th subject. Then by
(1.1) and letting $y_{ij} = y_i(t_{ij})$ and $\varepsilon_{ij} = \varepsilon_i(t_{ij})$,
we have

(1.2)    $y_{ij} = \eta(t_{ij}) + v_i(t_{ij}) + \varepsilon_{ij}, \qquad j = 1, 2, \ldots, n_i;\ i = 1, 2, \ldots, n.$

In many practical situations, the above discrete functional data (1.2) have to 
be first registered before any statistical inference can be conducted. Methods 
for curve registration can be found in Kneip and Gasser [19], Kneip and 
Engel [18], Silverman [26], Ramsay and Silverman ([23], Chapter 5), Ramsay 
and Li [22] and Ramsay and Silverman ([24], Chapter 7), among others. In 
this paper, for convenience, we assume that the functional data (1.2) do not 
need registration or have been registered. 

Estimation of the population characteristics $\eta(t)$, $\gamma(s, t)$ and $\sigma^2(t)$ of the
model (1.1) has been the focus of FDA in the literature. Most of the existing 
approaches involve one smoothing method or another. For example, Besse 
and Ramsay [3], Ramsay [20] and Ramsay and Dalzell [21] made use of re- 
producing kernel Hilbert space decomposition; Rice and Silverman [25] and 
Brumback and Rice [4] employed smoothing splines; Besse, Cardot and Fer- 
raty [2] used B-splines; Hart and Wehrly [16] employed kernel smoothing; 
and Kneip [17] studied a principal components-based approach. Development
of significance tests about $\eta(t)$ and other population characteristics of
the model (1.1) is more important and challenging. Faraway [13] discussed
the difficulties in extending some multivariate hypothesis testing procedures
to FDA. Ramsay and Silverman [23] suggested a pointwise $t$-test or $F$-test
but they did not discuss global tests. For curve data from stationary Gaus- 
sian processes, Fan and Lin [11] developed an adaptive Neyman test. 

In this paper we adopt the method of "smoothing first, then estimation" 
for functional data. That is, we construct the estimators for $\eta(t)$,
$\gamma(s, t)$ and $\sigma^2(t)$ using the reconstructed individual functions
$\hat f_i(t)$, $i = 1, 2, \ldots, n$, obtained using one smoothing method or
another; in particular, in this paper
we use the local polynomial kernel (LPK) smoothing technique as described 
in [10], among others. The idea of "smoothing first, then estimation" itself 
is hardly new since it has been used in the literature; see [23, 24] and the 
references therein. What is new here is that we investigate the effect of the 
substitution of the underlying individual functions $f_i(t)$, $i = 1, 2, \ldots, n$, by
their LPK reconstructions in FDA. We show that, under some mild conditions,
the effect of such a substitution is asymptotically ignorable in FDA.
Based on this, we derive the asymptotics of the estimators $\hat\eta(t)$,
$\hat\gamma(s, t)$ and $\hat\sigma^2(t)$. In particular, under some mild
conditions, we show that: (1) $\hat\eta(t)$ and $\hat\gamma(s, t)$ are
$\sqrt{n}$-consistent and asymptotically Gaussian; (2) the asymptotic
efficiency of $\hat\eta(t)$ is not affected by a better choice of bandwidth than
the one selected by a GCV rule; and (3) the convergence rate of $\hat\sigma^2(t)$
is affected by the convergence rate of the LPK reconstructions. More details
about these results are given in Section 2. 

In the model (1.1) the only covariate for the mean function $\eta(t)$ is time.
In many applications $\eta(t)$ may also depend on some time-independent
covariates and can be written as $\eta(t; x) = x^T \beta(t)$, where the covariate
vector $x = [x_1, \ldots, x_q]^T$ and the unknown but smooth coefficient function
vector $\beta(t) = [\beta_1(t), \ldots, \beta_q(t)]^T$. A replacement of $\eta(t)$ by
$\eta(t; x_i) = x_i^T \beta(t)$ in (1.1)
leads to the so-called functional linear model

(1.3)    $y_i(t) = x_i^T \beta(t) + v_i(t) + \varepsilon_i(t), \qquad i = 1, 2, \ldots, n,$

where $y_i(t)$, $v_i(t)$ and $\varepsilon_i(t)$ are the same as those defined in
(1.1). The ignorability of the substitution effect also applies to the LPK
reconstructions $\hat f_i(t)$ of the individual functions
$f_i(t) = x_i^T \beta(t) + v_i(t)$ of the above model.
Based on this, we construct the estimators $\hat\beta(t)$ and $\hat\gamma(s, t)$
and investigate their asymptotics; in particular, we show that $\hat\beta(t)$ is
$\sqrt{n}$-consistent and asymptotically Gaussian. Moreover, we propose a global
$L^2$-norm-based test statistic $T_n$ for a general hypothesis testing problem
about the covariate effects $\beta(t)$; its asymptotic random expression is
derived. More details about
these results are given in Section 3. 

The rest of the paper is organized as follows. In Section 4 we present a 
simulation study which aims to investigate the effect of the bandwidth choice 
on the accuracy of the LPK reconstructions $\hat f_i(t)$ and the mean function
estimator $\hat\eta(t)$. In Section 5 we illustrate the proposed methodologies by
applying them to a real functional data set collected in climatology. Finally, 
in Section 6 technical proofs of some asymptotic results are outlined. 

2. Basic methodologies. 

2.1. LPK reconstruction of individual functions. First of all, we describe
how to reconstruct the individual functions $f_i(t)$, $i = 1, 2, \ldots, n$, using the
LPK smoothing technique based on the standard nonparametric regression
model

(2.1)    $y_{ij} = f_i(t_{ij}) + \varepsilon_{ij}, \qquad j = 1, 2, \ldots, n_i;\ i = 1, 2, \ldots, n.$

For any fixed time point $t$, assume $f_i(t)$ has a $(p+1)$th continuous derivative
in a neighborhood of $t$ for some positive integer $p$. Then by Taylor's
expansion, $f_i(t_{ij})$ can be locally approximated by a $p$-order polynomial, that is,

$f_i(t_{ij}) \approx f_i(t) + (t_{ij} - t) f_i^{(1)}(t) + \cdots + (t_{ij} - t)^p f_i^{(p)}(t)/p! = z_{ij}^T \alpha_i,$

in the neighborhood of $t$, where $\alpha_i = [\alpha_{i0}, \alpha_{i1}, \ldots, \alpha_{ip}]^T$
with $\alpha_{ir} = f_i^{(r)}(t)/r!$, and $z_{ij} = [1, t_{ij} - t, \ldots, (t_{ij} - t)^p]^T$.
Then the $p$-order LPK reconstructions of $f_i(t)$ are defined as
$\hat f_i(t) = \hat\alpha_{i0} = e_{1,p+1}^T \hat\alpha_i$, where and throughout $e_{r,s}$ denotes
the $s$-dimensional unit vector whose $r$th component is 1 and others are 0,
and $\hat\alpha_i$ are the minimizers of the weighted least squares criterion

(2.2)    $\sum_{i=1}^{n} \sum_{j=1}^{n_i} \{y_{ij} - z_{ij}^T \alpha_i\}^2 K_h(t_{ij} - t) = \sum_{i=1}^{n} (y_i - Z_i \alpha_i)^T K_{ih} (y_i - Z_i \alpha_i),$

where $y_i = [y_{i1}, \ldots, y_{in_i}]^T$, $Z_i = [z_{i1}, \ldots, z_{in_i}]^T$ and
$K_{ih} = \mathrm{diag}(K_h(t_{i1} - t), \ldots, K_h(t_{in_i} - t))$, with
$K_h(\cdot) = K(\cdot/h)/h$, obtained by rescaling a kernel function $K(\cdot)$
(often a symmetric p.d.f.) with bandwidth $h > 0$ that controls
the size of the associated neighborhood. Minimizing (2.2) with respect to
$\alpha_i$, $i = 1, 2, \ldots, n$, is equivalent to minimizing the $i$th term in the summation
on the right-hand side of (2.2) with respect to $\alpha_i$ for each $i = 1, 2, \ldots, n$. It
follows that for $i = 1, 2, \ldots, n$,

(2.3)    $\hat f_i(t) = e_{1,p+1}^T (Z_i^T K_{ih} Z_i)^{-1} Z_i^T K_{ih} y_i = \sum_{j=1}^{n_i} K_h^{n_i}(t_{ij} - t)\, y_{ij},$

where $K_h^{n_i}(t)$ are known as the empirical equivalent kernels for the $p$-order
LPK; see Fan and Gijbels [10].
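The weighted least squares solution behind (2.3) can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the function names `lpk_weights` and `lpk_reconstruct` are hypothetical, and the Epanechnikov kernel is an assumed choice of a bounded symmetric p.d.f. on $[-1, 1]$.

```python
import numpy as np

def lpk_weights(t_obs, t0, h, p=1):
    """Empirical equivalent-kernel weights of a p-order LPK fit at t0.

    Returns w such that the fitted value at t0 equals w @ y.
    The Epanechnikov kernel is an illustrative assumption.
    """
    u = (t_obs - t0) / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0) / h  # K_h(t_ij - t0)
    Z = np.vander(t_obs - t0, N=p + 1, increasing=True)           # rows are z_ij^T
    ZtK = Z.T * k                                                 # Z^T K_ih
    # Solve (Z^T K Z) A = Z^T K; the first row of A is e_1^T (Z^T K Z)^{-1} Z^T K.
    A = np.linalg.solve(ZtK @ Z, ZtK)
    return A[0]

def lpk_reconstruct(t_obs, y_obs, t_grid, h, p=1):
    """Evaluate the p-order LPK reconstruction of one curve on t_grid."""
    return np.array([lpk_weights(t_obs, t0, h, p) @ y_obs for t0 in t_grid])
```

A local linear fit ($p = 1$) reproduces straight lines exactly, which gives a quick sanity check of the weights; the bandwidth must be large enough that each window contains at least $p + 1$ distinct design points.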

In (2.2) different bandwidths may be used for different individual func- 
tions. However, the individual functions in a functional data set are i.i.d. reali- 
zations of a stochastic process, and hence, often admit similar smoothness 
properties and sometimes similar shapes [17, 18]; it is then reasonable to 
treat them in the same way, for example, using a common bandwidth for all 
of them. The advantages in using a common bandwidth at least include the 
following: (a) reduce the computational effort for bandwidth selection; and 
(b) simplify the asymptotic results of the estimators. 

For convenience, we define the following widely-used functionals of a
kernel $K$:

(2.4)    $B_r(K) = \int K(t)\, t^r\, dt, \qquad V(K) = \int K^2(t)\, dt, \qquad K^{(1)}(t) = \int K(s) K(s + t)\, ds.$

For estimating a function instead of derivatives, Fan and Gijbels [10] pointed
out that even orders are not appealing. Therefore, throughout this paper, we
assume $p$ is an odd integer; moreover, we denote $\gamma_{k,l}(s, t)$ as the
$(k, l)$-times partial derivative of $\gamma(s, t)$, that is,
$\gamma_{k,l}(s, t) = \partial^{k+l} \gamma(s, t)/\partial s^k \partial t^l$, and denote
$\mathcal{D}$ as the set of all the design time points $t_{ij}$,
$j = 1, 2, \ldots, n_i$; $i = 1, 2, \ldots, n$. In addition,
we denote $O_{UP}(1)$ [resp. $o_{UP}(1)$] as "bounded (resp. tends to 0) in probability
uniformly for any $t$ within the interior of $\mathcal{T}$ and all $i = 1, 2, \ldots, n$." Finally,
the following regularity conditions are imposed.

Condition A.

1. The design time points $t_{ij}$, $j = 1, 2, \ldots, n_i$; $i = 1, 2, \ldots, n$, are i.i.d. with
p.d.f. $\pi(\cdot)$ which has the bounded support $\mathcal{T} = [a, b]$. For any given $t$
within the interior of $\mathcal{T}$, $\pi'(t)$ exists and is continuous over $\mathcal{T}$.

2. Let $s$ and $t$ be any two interior time points of $\mathcal{T}$. The individual functions
$f_i(t)$, $i = 1, 2, \ldots, n$, and their mean function $\eta(t)$ have up to $(p+1)$-times
continuous derivatives. Their covariance function $\gamma(s, t)$ has up to $(p+1)$-times
continuous derivatives for both $s$ and $t$. The variance function of
the measurement errors, $\sigma^2(t)$, is continuous at $t$.

3. The kernel $K$ is a bounded symmetric p.d.f. with bounded support
$[-1, 1]$.

4. There are two positive constants $C$ and $\delta$ such that $n_i \ge C n^{\delta}$, for all
$i = 1, 2, \ldots, n$. As $n \to \infty$, we have $h \to 0$ and $n^{\delta} h \to \infty$.
Remark 1. For some practical functional data sets, Condition A4 may
be too restrictive. For example, a functional data set with a few individual
functions having $n_i < C n^{\delta}$ does not satisfy Condition A4. However, such
a functional data set can often be slightly modified to satisfy Condition
A4. A simple way of doing so is to drop those individual functions having
$n_i < C n^{\delta}$ so that the remaining individual functions form a new functional
data set which satisfies Condition A4. This procedure will not result in less
efficient estimators when $\tilde n/n \to 0$ and will not affect the consistency of the
estimators when $(n - \tilde n) \to \infty$, where $\tilde n$ is the number of dropped individual
functions, which may be bounded or tend to $\infty$ as $n \to \infty$.

Using Lemma 3 in Section 6, it is easy to show the following. 

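Theorem 1. Assume Condition A is satisfied. Then the average conditional
MSE (mean squared error) of the $p$-order LPK reconstructions
$\hat f_i(t)$, $i = 1, 2, \ldots, n$, is

(2.5)    $n^{-1} \sum_{i=1}^{n} E\{[\hat f_i(t) - f_i(t)]^2 | \mathcal{D}\} = \left[ \frac{B_{p+1}^2(K^*) \{ [\eta^{(p+1)}(t)]^2 + \gamma_{p+1,p+1}(t, t) \}}{\{(p+1)!\}^2} h^{2(p+1)} + \frac{V(K^*) \sigma^2(t)}{\pi(t)} (\tilde m h)^{-1} \right] [1 + o_P(1)],$

where $K^*$ is the equivalent kernel of the $p$-order LPK ([10], page 64), and
$\tilde m = (n^{-1} \sum_{i=1}^{n} n_i^{-1})^{-1}$.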

Remark 2. On the left-hand side of (2.5) the notation $E\{\cdot | \mathcal{D}\}$
denotes the conditional expectation when all the design time points $t_{ij}$,
$j = 1, 2, \ldots, n_i$; $i = 1, 2, \ldots, n$, are given. Nevertheless, the leading term on the
right-hand side of (2.5) is independent of $\mathcal{D}$ and hence, the left-hand side is
nearly unconditional. For technical convenience and following the convention in
the literature (e.g., [10]), we keep using the "conditional expectation" notation
$E\{\cdot | \mathcal{D}\}$ here and throughout. This remark applies to all other statistical
operations conditional on $\mathcal{D}$ given in this paper.

Theorem 1 indicates that the optimal bandwidth of the $p$-order LPK
reconstructions $\hat f_i(t)$ is
$h = O_P(\tilde m^{-1/(2p+3)}) = O_P(n^{-\delta/(2p+3)})$. Using Lemma 3
again, we can show the following.

Theorem 2. Assume Condition A is satisfied. Then for the $p$-order
LPK reconstructions $\hat f_i(t)$ using the bandwidth $h = O(n^{-\delta/(2p+3)})$, we have

(2.6)    $\hat f_i(t) = f_i(t) + n^{-(p+1)\delta/(2p+3)} O_{UP}(1), \qquad i = 1, 2, \ldots, n.$
Theorem 2 implies that, under the given conditions, the LPK reconstructions
$\hat f_i(t)$ differ little, asymptotically and uniformly, from the underlying
individual functions $f_i(t)$. We expect that this is true not only for LPK but
also for any other linear smoothers, for example, smoothing splines [14, 27], 
regression splines or orthogonal series [7], among others. 

2.2. Estimation of the mean and covariance functions. It is then natural
to estimate the mean function $\eta(t)$ and the covariance function $\gamma(s, t)$
by the sample mean and sample covariance functions of the $p$-order LPK
reconstructions $\hat f_i(t)$,

(2.7)    $\hat\eta(t) = n^{-1} \sum_{i=1}^{n} \hat f_i(t),$
         $\hat\gamma(s, t) = (n - 1)^{-1} \sum_{i=1}^{n} \{\hat f_i(s) - \hat\eta(s)\}\{\hat f_i(t) - \hat\eta(t)\}.$
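On a common evaluation grid, the estimators in (2.7) reduce to a column mean and a sample covariance matrix. The sketch below assumes the reconstructions have already been evaluated at the same $M$ grid points and stored row-wise; the name `mean_cov_functions` is hypothetical.

```python
import numpy as np

def mean_cov_functions(F_hat):
    """Sample mean and covariance functions of reconstructed curves.

    F_hat: (n, M) array; row i holds the reconstruction of curve i on a
    common grid of M time points (an assumption of this sketch).
    Returns eta_hat of shape (M,) and gamma_hat of shape (M, M).
    """
    n = F_hat.shape[0]
    eta_hat = F_hat.mean(axis=0)          # n^{-1} sum_i f_hat_i(t)
    R = F_hat - eta_hat                   # centered curves
    gamma_hat = R.T @ R / (n - 1)         # (n-1)^{-1} sum_i r_i(s) r_i(t)
    return eta_hat, gamma_hat
```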

The asymptotic conditional bias, covariance and variance for $\hat\eta(t)$ are given
below.



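Theorem 3. Assume Condition A is satisfied. Then as $n \to \infty$, the
asymptotic conditional bias, covariance and variance of $\hat\eta(t)$ are

$\mathrm{Bias}\{\hat\eta(t) | \mathcal{D}\} = \frac{B_{p+1}(K^*)\, \eta^{(p+1)}(t)}{(p+1)!} h^{p+1} [1 + o_P(1)],$

$\mathrm{Cov}\{\hat\eta(s), \hat\eta(t) | \mathcal{D}\} = \left\{ \frac{\gamma(s, t)}{n} + \frac{K^{*(1)}\{(s - t)/h\}\, \sigma^2(s)}{\pi(t)} (n \tilde m h)^{-1} + \frac{B_{p+1}(K^*) [\gamma_{p+1,0}(s, t) + \gamma_{0,p+1}(s, t)]}{(p+1)!} n^{-1} h^{p+1} \right\} [1 + o_P(1)],$

$\mathrm{Var}\{\hat\eta(t) | \mathcal{D}\} = \left\{ \frac{\gamma(t, t)}{n} + \frac{V(K^*)\, \sigma^2(t)}{\pi(t)} (n \tilde m h)^{-1} + \frac{B_{p+1}(K^*) [\gamma_{p+1,0}(t, t) + \gamma_{0,p+1}(t, t)]}{(p+1)!} n^{-1} h^{p+1} \right\} [1 + o_P(1)].$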

Remark 3. Under Condition A and by Theorem 3, we have

(2.8)    $\mathrm{MSE}\{\hat\eta(t) | \mathcal{D}\} = \gamma(t, t)/n + O_{UP}\{h^{2(p+1)} + (n \tilde m h)^{-1} + n^{-1} h^{p+1}\}.$

We then always have $\mathrm{MSE}\{\hat\eta(t) | \mathcal{D}\} = \gamma(t, t)/n + o_{UP}(1/n)$, provided that

(2.9)    $\tilde m h \to \infty, \qquad n h^{2(p+1)} \to 0.$



Remark 4. Condition (2.9) is satisfied by any bandwidth $h = O(n^{-\delta^*})$
with $1/[2(p+1)] < \delta^* < \delta$. In particular, it is satisfied by the optimal
bandwidth, $h = O(n^{-\delta/(2p+3)})$, for the $p$-order LPK reconstructions $\hat f_i(t)$ when
$\delta > 1 + 1/[2(p+1)]$. In this case, the $p$-order LPK reconstruction optimal
bandwidth is sufficiently small to guarantee the $\sqrt{n}$-consistency of $\hat\eta(t)$.
Condition (2.9) is also satisfied by the optimal bandwidth, $h = O(n^{-(1+\delta)/(2p+3)})$
(when $1 + 1/[2(p+1)] < \delta < 1 + 1/(p+1)$) or $h = O(n^{-\delta/(p+2)})$ [when
$\delta > 1 + 1/(p+1)$], for $\hat\eta(t)$. It follows that, in both cases, the optimal
bandwidth admits the same asymptotic efficiency for estimating $\eta(t)$ because
$\mathrm{MSE}\{\sqrt{n}\, \hat\eta(t) | \mathcal{D}\} \to \gamma(t, t)$ as $n \to \infty$.

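By pretending all the underlying individual functions $f_i(t)$ were observed,
the "ideal" estimators of $\eta(t)$ and $\gamma(s, t)$ are

(2.10)   $\tilde\eta(t) = n^{-1} \sum_{i=1}^{n} f_i(t),$
         $\tilde\gamma(s, t) = (n - 1)^{-1} \sum_{i=1}^{n} \{f_i(s) - \tilde\eta(s)\}\{f_i(t) - \tilde\eta(t)\}.$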

Theorem 4. Assume Condition A is satisfied, and the bandwidth
$h = O(n^{-\delta/(2p+3)})$ is used for the $p$-order LPK reconstructions $\hat f_i(t)$. Then as
$n \to \infty$, we have

(2.11)   $\hat\eta(t) = \tilde\eta(t) + n^{-(p+1)\delta/(2p+3)} O_{UP}(1),$
         $\hat\gamma(s, t) = \tilde\gamma(s, t) + n^{-(p+1)\delta/(2p+3)} O_{UP}(1).$

In addition, assume $\delta > 1 + 1/[2(p+1)]$. Then as $n \to \infty$, we have

(2.12)   $\sqrt{n}\{\hat\eta(t) - \eta(t)\} \sim \mathrm{AGP}(0, \gamma),$
         $\sqrt{n}\{\hat\gamma(s, t) - \gamma(s, t)\} \sim \mathrm{AGP}(0, \gamma^*),$

where $\mathrm{AGP}(\eta, \gamma)$ denotes an asymptotic Gaussian process with mean
function $\eta(t)$ and covariance function $\gamma(s, t)$, and

(2.13)   $\gamma^*\{(s_1, t_1), (s_2, t_2)\} = E\{v_1(s_1) v_1(t_1) v_1(s_2) v_1(t_2)\} - \gamma(s_1, t_1) \gamma(s_2, t_2),$

with $v_1(t)$ denoting the subject effect of the first individual function $f_1(t)$ as
defined in (1.1). When the subject effect process $v(t)$ is Gaussian,

$\gamma^*\{(s_1, t_1), (s_2, t_2)\} = \gamma(s_1, t_2) \gamma(s_2, t_1) + \gamma(s_1, s_2) \gamma(t_1, t_2).$
Theorem 4 indicates that, under some mild conditions, the proposed
estimators (2.7) are asymptotically identical to the "ideal" estimators (2.10).
The required key condition is $\delta > 1 + 1/[2(p+1)]$. It follows that, to make
the measurement errors ignorable via LPK smoothing, we need the number
of measurements, $n_i$, for all the subjects (or a large number of subjects; see
Remark 1 for discussion) to tend to infinity slightly faster than the number
of subjects, $n$.

2.3. Estimation of the noise variance function. The noise variance
function $\sigma^2(t)$ measures the variation of the measurement errors
$\varepsilon_{ij}$ of the model (1.2). Following Hall and Marron [15] and Fan and
Yao [12], we can construct a $p$-order LPK estimator of $\sigma^2(t)$ based on the
$p$-order LPK residuals $\hat\varepsilon_{ij} = y_{ij} - \hat f_i(t_{ij})$, although our
setting is more complicated. As expected, the resulting $p$-order LPK estimator of
$\sigma^2(t)$ will be consistent, but its convergence rate will be affected by that
of the $p$-order LPK reconstructions $\hat f_i(t)$, $i = 1, 2, \ldots, n$.

As an illustration, let us consider the simplest LPK estimator, that is, the
kernel estimator for $\sigma^2(t)$ based on $\hat\varepsilon_{ij}$,

(2.14)   $\hat\sigma^2(t) = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n_i} H_b(t_{ij} - t)\, \hat\varepsilon_{ij}^2}{\sum_{i=1}^{n} \sum_{j=1}^{n_i} H_b(t_{ij} - t)},$

where $H_b(\cdot) = H(\cdot/b)/b$ with the kernel function $H$ and the bandwidth $b$.
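The ratio in (2.14) is a kernel-weighted average of squared residuals pooled over all subjects. A minimal sketch follows; the function name `noise_variance` is hypothetical, the Epanechnikov kernel for $H$ is an assumed choice, and the residuals are taken as already computed from some reconstruction step.

```python
import numpy as np

def noise_variance(t_list, resid_list, t0, b):
    """Kernel estimate of the noise variance at time t0, as in (2.14).

    t_list[i], resid_list[i]: design points and LPK residuals of subject i.
    H is taken to be the Epanechnikov kernel (an illustrative assumption).
    """
    t_all = np.concatenate(t_list)
    e_all = np.concatenate(resid_list)
    u = (t_all - t0) / b
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0) / b  # H_b(t_ij - t0)
    return float(np.sum(w * e_all**2) / np.sum(w))
```

The estimate is undefined when no design point falls within distance $b$ of `t0`, so $b$ should not be chosen smaller than the local spacing of the pooled design points.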

Pretending $\hat\varepsilon_{ij} \equiv \varepsilon_{ij}$, by standard kernel
estimation theory (Wand and Jones [28] and Fan and Gijbels [10], among others),
the optimal bandwidth for $\hat\sigma^2(t)$ is $b = O_P(N^{-1/5})$, where
$N = \sum_{i=1}^{n} n_i$ denotes the total number of
measurements for all the subjects, and the associated convergence rate of
$\hat\sigma^2(t)$ is $O_P(N^{-2/5})$. However, for the current setup, this convergence rate
will be affected by the convergence rate of the $p$-order LPK reconstructions
$\hat f_i(t)$, $i = 1, 2, \ldots, n$, since under Condition A and by Theorem 2, we
actually only have $\hat\varepsilon_{ij} = \varepsilon_{ij} + n^{-(p+1)\delta/(2p+3)} O_{UP}(1)$.
For convenience, let $\nu_1(t) = E[\varepsilon^2(t)] = \sigma^2(t)$ and
$\nu_2(t) = \mathrm{Var}[\varepsilon^2(t)]$.

Theorem 5. Assume Condition A is satisfied and the $p$-order LPK
reconstructions $\hat f_i(t)$ use a bandwidth $h = O(n^{-\delta/(2p+3)})$. In addition, assume
$\nu_1''(t)$ and $\nu_2(t)$ exist and are continuous at $t \in \mathcal{T}$, and the kernel estimator
$\hat\sigma^2(t)$ uses a bandwidth $b = O(N^{-1/5})$. Then we have

(2.15)   $\hat\sigma^2(t) = \sigma^2(t) + O_{UP}(n^{-2(1+\delta)/5} + n^{-(p+1)\delta/(2p+3)}).$

By the above theorem, it is seen that when $\delta < 2(2p+3)/(p-1)$, the
second order term dominates the first order term; and in particular, when $p = 1$,
we have $\hat\sigma^2(t) = \sigma^2(t) + O_{UP}(n^{-2\delta/5})$. In this case the optimal convergence
rate of $\hat\sigma^2(t)$ is not attainable. It is attainable only when $\delta > 2(2p+3)/(p-1)$,
so that the first order term in (2.15) dominates the second order term.
This is the case only when $p \ge 3$. When $p = 3$, $\delta > 9$ is required; and when
$p = 2k + 1 \to \infty$, $\delta > 4$ is required. Therefore, it is usually difficult to make
the convergence rate of $\hat\sigma^2(t)$ unaffected by the convergence rate of the
$p$-order LPK reconstructions $\hat f_i(t)$.
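The threshold on $\delta$ follows from comparing the two exponents in (2.15): the term with the smaller exponent decays more slowly and therefore dominates. The arithmetic can be checked exactly with the sketch below; the helper names `rate_exponents` and `delta_threshold` are hypothetical.

```python
from fractions import Fraction

def rate_exponents(p, delta):
    """Exponents a, b such that the two terms in (2.15) are n^{-a} and n^{-b}."""
    first = Fraction(2) * (1 + Fraction(delta)) / 5          # from N^{-2/5}, N ~ n^{1+delta}
    second = Fraction(p + 1) * Fraction(delta) / (2 * p + 3)  # from the LPK reconstructions
    return first, second

def delta_threshold(p):
    """Value of delta at which both exponents coincide; None when p = 1,
    in which case the second term is slower for every delta > 0."""
    if p <= 1:
        return None
    return Fraction(2 * (2 * p + 3), p - 1)
```

For example, `delta_threshold(3)` gives 9, and the threshold decreases toward 4 as $p$ grows, in line with the values quoted above.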

2.4. Bandwidth selection. Theorem 1 suggests that we can choose a good
bandwidth for the $p$-order LPK reconstructions $\hat f_i(t)$, $i = 1, 2, \ldots, n$, using the
generalized cross-validation (GCV) score

(2.16)   $\mathrm{GCV}(h) = n^{-1} \sum_{i=1}^{n} \mathrm{GCV}_i(h),$

where $\mathrm{GCV}_i(h)$ is the GCV score of the $i$th $p$-order LPK reconstruction $\hat f_i(t)$.
Let $A_i$ be the smoother matrix of the $i$th subject constructed using (2.3).
Then we have $\hat y_i = A_i y_i$ and
$\mathrm{GCV}_i(h) = y_i^T (I_{n_i} - A_i)^T (I_{n_i} - A_i) y_i / [1 - \mathrm{tr}(A_i)/n_i]^2$,
where $y_i = [y_{i1}, \ldots, y_{in_i}]^T$, $\hat y_i = [\hat y_{i1}, \ldots, \hat y_{in_i}]^T$ and
$\mathrm{tr}(S)$ denotes the trace of the matrix $S$. In practice, the optimal bandwidth $h^*$ can be
obtained via minimizing $\mathrm{GCV}(h)$ over a number of bandwidth candidates
of interest. Theoretically, it is expected that $h^* = O_P(n^{-\delta/(2p+3)})$.
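The GCV rule (2.16) can be sketched as a grid search over candidate bandwidths. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, and a Gaussian kernel is used (as in the simulation study of Section 4) even though it does not have the bounded support required by Condition A3.

```python
import numpy as np

def smoother_matrix(t_obs, h, p=1):
    """Smoother matrix A_i of a p-order LPK fit at the design points themselves."""
    n_i = len(t_obs)
    A = np.empty((n_i, n_i))
    for r, t0 in enumerate(t_obs):
        d = t_obs - t0
        k = np.exp(-0.5 * (d / h) ** 2)      # Gaussian weights; the constant cancels
        Z = np.vander(d, N=p + 1, increasing=True)
        ZtK = Z.T * k
        A[r] = np.linalg.solve(ZtK @ Z, ZtK)[0]
    return A

def gcv_score(t_list, y_list, h, p=1):
    """Average of the per-subject GCV scores, as in (2.16)."""
    scores = []
    for t_obs, y in zip(t_list, y_list):
        A = smoother_matrix(t_obs, h, p)
        resid = y - A @ y
        scores.append(resid @ resid / (1.0 - np.trace(A) / len(y)) ** 2)
    return float(np.mean(scores))

def select_bandwidth(t_list, y_list, candidates, p=1):
    """Return the candidate bandwidth with the smallest GCV score."""
    scores = [gcv_score(t_list, y_list, h, p) for h in candidates]
    return candidates[int(np.argmin(scores))]
```

Candidates much smaller than the spacing of the design points make the local fit nearly interpolate the data and the linear system ill-conditioned, so the candidate grid should be bounded away from zero.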

Remark 4 states that, under the required conditions, the optimal bandwidth
for $\hat\eta(t)$ and the optimal bandwidth for the $p$-order LPK reconstructions
$\hat f_i(t)$ admit the same asymptotic efficiency for estimating $\eta(t)$. Therefore,
it is generally sufficient to use $h^*$ for estimating $\eta(t)$ although, for finite
samples, better bandwidth choices for $\hat\eta(t)$ are possible.

3. Functional linear models. Notice that Theorem 2 also applies to
the $p$-order LPK reconstructions $\hat f_i(t)$ of the underlying individual functions
$f_i(t) = x_i^T \beta(t) + v_i(t)$, $i = 1, 2, \ldots, n$, of the functional linear model (1.3).
This property can be used to do inference about the model (1.3). In this
section we focus on the estimation and significance tests of the coefficient
function vector (covariate effects) $\beta(t)$ of the model.

3.1. Coefficient function estimation. Let $\hat f(t) = [\hat f_1(t), \ldots, \hat f_n(t)]^T$ and
$X = [x_1, \ldots, x_n]^T$. Throughout this paper we assume $X$ has full rank. Then the
least-squares estimator of $\beta(t)$ is

(3.1)    $\hat\beta(t) = \left( \sum_{i=1}^{n} x_i x_i^T \right)^{-1} \sum_{i=1}^{n} x_i \hat f_i(t) = (X^T X)^{-1} X^T \hat f(t),$

which minimizes $Q(\beta) = n^{-1} \sum_{i=1}^{n} \int [\hat f_i(t) - x_i^T \beta(t)]^2\, dt$. It follows that the
subject-effects $v_i(t)$ can be estimated by $\hat v_i(t) = \hat f_i(t) - x_i^T \hat\beta(t)$ and their
covariance function $\gamma(s, t)$ can be estimated by

(3.2)    $\hat\gamma(s, t) = (n - q)^{-1} \sum_{i=1}^{n} \hat v_i(s) \hat v_i(t) = (n - q)^{-1}\, \hat v(s)^T \hat v(t),$

where $\hat v(t) = [\hat v_1(t), \hat v_2(t), \ldots, \hat v_n(t)]^T = \hat f(t) - X (X^T X)^{-1} X^T \hat f(t) = (I_n - P) \hat f(t)$
and $P = X (X^T X)^{-1} X^T$ is a projection matrix with $P^T = P$, $P^2 = P$ and
$\mathrm{tr}(P) = q$.
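Because the same design matrix $X$ is used at every time point, (3.1) and (3.2) can be evaluated on a grid with a single least-squares solve. The sketch below assumes the reconstructions are stored on a common grid; the name `fit_functional_linear_model` is hypothetical.

```python
import numpy as np

def fit_functional_linear_model(X, F_hat):
    """Pointwise least squares for the functional linear model (1.3).

    X: (n, q) covariate matrix of full column rank.
    F_hat: (n, M) reconstructed curves on a common grid (an assumption here).
    Returns beta_hat (q, M), residual curves V_hat (n, M), gamma_hat (M, M).
    """
    n, q = X.shape
    # Column m of beta_hat equals (X^T X)^{-1} X^T f_hat(t_m).
    beta_hat, *_ = np.linalg.lstsq(X, F_hat, rcond=None)
    V_hat = F_hat - X @ beta_hat           # (I_n - P) f_hat(t)
    gamma_hat = V_hat.T @ V_hat / (n - q)  # (n - q)^{-1} v_hat(s)^T v_hat(t)
    return beta_hat, V_hat, gamma_hat
```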

Pretending $f_i(t)$, $i = 1, 2, \ldots, n$, are known, the "ideal" estimators of $\beta(t)$
and $\gamma(s, t)$ are

(3.3)    $\tilde\beta(t) = (X^T X)^{-1} X^T f(t),$
         $\tilde\gamma(s, t) = (n - q)^{-1}\, \tilde v(s)^T \tilde v(t),$

where $f(t) = [f_1(t), \ldots, f_n(t)]^T$ and $\tilde v(t) = (I_n - P) f(t)$. It is easy to show
that $E \tilde\beta(t) = \beta(t)$ and $E \tilde\gamma(s, t) = \gamma(s, t)$. For further investigation, we impose
the following conditions.

Condition B. 

1. The covariate vectors $x_i$, $i = 1, 2, \ldots, n$, are i.i.d. with finite and invertible
second moment $E x_i x_i^T = \Omega$; moreover, they are uniformly bounded in
probability; that is, $x_i = O_{UP}(1)$.

2. The subject-effects $v_i(t)$ are uniformly bounded in probability; that is,
$v_i(t) = O_{UP}(1)$.

Theorem 6. Assume Conditions A and B are satisfied, and the $p$-order
LPK reconstructions $\hat f_i(t)$ use a bandwidth $h = O(n^{-\delta/(2p+3)})$. Then as
$n \to \infty$, we have

(3.4)    $\hat\beta(t) = \tilde\beta(t) + n^{-(p+1)\delta/(2p+3)} O_{UP}(1),$
         $\hat\gamma(s, t) = \tilde\gamma(s, t) + n^{-(p+1)\delta/(2p+3)} O_{UP}(1).$

In addition, assume $\delta > 1 + 1/[2(p+1)]$. Then as $n \to \infty$, we have

(3.5)    $\sqrt{n}\{\hat\beta(t) - \beta(t)\} \sim \mathrm{AGP}(0, \gamma_\beta),$

where $\gamma_\beta(s, t) = \gamma(s, t)\, \Omega^{-1}$.

Theorem 6 implies that, under the given conditions, the proposed estimators
$\hat\beta(t)$ and $\hat\gamma(s, t)$ are asymptotically identical to the "ideal" estimators
$\tilde\beta(t)$ and $\tilde\gamma(s, t)$, respectively. Therefore, in FDA it seems reasonable to
directly assume the underlying individual functions are "observed" as is done
in [23, 24]. The asymptotic result stated in (3.5) is a foundation for signifi- 
cance tests of the covariate effects. 






J.-T. ZHANG AND J. CHEN 



3.2. Significance tests of the covariate effects. Consider the general
hypothesis testing problem

(3.6)    $H_0: C \beta(t) = c(t), \quad \text{vs.} \quad H_1: C \beta(t) \ne c(t),$

where $t \in \mathcal{T} = [a, b]$, $C$ is a given $k \times q$ full rank matrix, and
$c(t) = [c_1(t), \ldots, c_k(t)]^T$ is a given vector of functions. In order to check the
significance of the $r$th covariate effect, one takes
$C = e_{r,q}^T = [0, \ldots, 0, 1, 0, \ldots, 0]$ and $c(t) = 0$; in
order to check if the first two coefficient functions are the same, that is,
$\beta_1(t) = \beta_2(t)$, one takes $C = (e_{1,q} - e_{2,q})^T = [1, -1, 0, \ldots, 0]$ and $c(t) = 0$.
It is natural to estimate $C \beta(t)$ by $C \hat\beta(t)$. By Theorem 6, we have

(3.7)    $\sqrt{n}\,[C \hat\beta(t) - c(t)] \sim \mathrm{AGP}(\eta_c, \gamma_c),$

where $\eta_c(t) = \sqrt{n}\,[C \beta(t) - c(t)]$ and $\gamma_c(s, t) = \gamma(s, t)\, C \Omega^{-1} C^T$. Let

(3.8)    $w(t) = \{C (X^T X)^{-1} C^T\}^{-1/2} [C \hat\beta(t) - c(t)] = [w_1(t), \ldots, w_k(t)]^T.$

Since $X^T X/n \to \Omega$ as $n \to \infty$, using (3.7), we can show that
$w(t) \sim \mathrm{AGP}(\eta_w, \gamma_w)$, where

(3.9)    $\eta_w(t) = \sqrt{n}\,(C \Omega^{-1} C^T)^{-1/2} [C \beta(t) - c(t)] = [\eta_{w1}(t), \ldots, \eta_{wk}(t)]^T,$
         $\gamma_w(s, t) = \gamma(s, t)\, I_k,$

where $I_k$ denotes the identity matrix of size $k$. It follows that the components
$w_1(t), \ldots, w_k(t)$ are independent asymptotic Gaussian processes with mean
functions $\eta_{w1}(t), \ldots, \eta_{wk}(t)$, respectively, and a common covariance function
$\gamma(s, t)$. That is,

(3.10)   $w_l(t) \sim \mathrm{AGP}(\eta_{wl}, \gamma), \qquad l = 1, 2, \ldots, k.$

Based on these results and with $C$ and $c(t)$ properly specified, pointwise $t$-
and $F$-tests for the coefficient functions $\beta_1(t), \ldots, \beta_q(t)$ can easily be
conducted ([23], Chapter 9). We here propose the following global test statistic
for the general hypothesis testing problem (3.6):

(3.11)   $T_n = \int_a^b \|w(t)\|^2\, dt = \sum_{l=1}^{k} \int_a^b w_l^2(t)\, dt,$

where $\|\cdot\|$ denotes the usual $L^2$-norm. Let $\tilde T_n$ be the associated "ideal"
global test statistic, obtained by replacing $\hat\beta(t)$ by the "ideal" estimator
$\tilde\beta(t)$ as defined in (3.3).
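The statistic (3.11) can be approximated on a grid by numerical integration. The sketch below uses the identity $\|w(t)\|^2 = d(t)^T \{C (X^T X)^{-1} C^T\}^{-1} d(t)$ with $d(t) = C \hat\beta(t) - c(t)$, which avoids computing the matrix square root; the function name `global_test_statistic` and the trapezoid rule are choices made for this illustration.

```python
import numpy as np

def global_test_statistic(X, beta_hat, C, c, t_grid):
    """Approximate T_n in (3.11) by the trapezoid rule.

    X: (n, q); beta_hat: (q, M) on t_grid; C: (k, q); c: (k, M) or broadcastable.
    """
    D = C @ beta_hat - c                        # d(t) on the grid, shape (k, M)
    G = C @ np.linalg.solve(X.T @ X, C.T)       # C (X^T X)^{-1} C^T, shape (k, k)
    # ||w(t)||^2 = d(t)^T G^{-1} d(t) at each grid point.
    integrand = np.einsum("km,km->m", D, np.linalg.solve(G, D))
    dt = np.diff(t_grid)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * dt))
```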
To derive the asymptotic random expression of $T_n$, we assume that $\gamma(s, t)$
has finite trace, that is, $\mathrm{tr}(\gamma) = \int_a^b \gamma(t, t)\, dt < \infty$. Let
$\lambda_1, \lambda_2, \ldots$ be the eigenvalues, in decreasing order, and
$\phi_1(t), \phi_2(t), \ldots$ be the associated orthonormal eigenfunctions of
$\gamma(s, t)$. Let $m$ denote the number of positive eigenvalues. When all the
eigenvalues are positive, we let $m = \infty$. Then $\lambda_r > 0$ for
$r \le m$ and $\lambda_r = 0$ for all $r > m$. Since $\mathrm{tr}(\gamma) < \infty$ implies
$\int_a^b \int_a^b \gamma^2(s, t)\, ds\, dt < \infty$
by the Cauchy-Schwarz inequality, the covariance function $\gamma(s, t)$ has the
singular value decomposition ([27], page 3)

(3.12)   $\gamma(s, t) = \sum_{r=1}^{m} \lambda_r \phi_r(s) \phi_r(t), \qquad s, t \in \mathcal{T} = [a, b].$



Theorem 7. Assume the conditions of Theorem 6 are satisfied. Then
as $n \to \infty$, we have

(3.13)   $T_n = \tilde T_n + n^{1/2 - (p+1)\delta/(2p+3)} O_P(1).$

In addition, assume $\delta > 1 + 1/[2(p+1)]$ and $\gamma(s, t)$ has finite trace so that
it has the singular value decomposition (3.12). Then as $n \to \infty$, we have

(3.14)   $T_n \stackrel{d}{=} \sum_{r=1}^{m} \lambda_r A_r + o_P(1), \qquad A_r \sim \chi_k^2(u_r^2),$

where $X \stackrel{d}{=} Y$ means the random variables $X$ and $Y$ have the same
distribution, $\chi_k^2(u_r^2)$ denotes a $\chi^2$-distribution with $k$ degrees of freedom
and the noncentral parameters

(3.15)   $u_r^2 = \lambda_r^{-1} \left\| \int_a^b \eta_w(t) \phi_r(t)\, dt \right\|^2, \qquad r = 1, 2, \ldots, m.$

Under $H_0$, $\eta_w(t) = 0$ so that all the $u_r^2$ are 0.

Theorem 7 suggests that the distribution of $T_n$ is asymptotically the same
as that of a $\chi^2$-type mixture. There are three possible methods that can be
used to approximate the null distribution of $T_n$: $\chi^2$-approximation,
simulation and bootstrapping. In the first two methods, we approximate the null
distribution of $T_n$ by that of the $\chi^2$-type mixture
$S = \sum_{r=1}^{\hat m} \hat\lambda_r A_r$, where
$A_r \sim \chi_k^2$, $\hat\lambda_r$ are the eigenvalues of $\hat\gamma(s, t)$ and $\hat m$ is
some well-chosen integer such that the eigenvalues $\hat\lambda_r$, $r = 1, 2, \ldots, \hat m$,
explain a sufficiently large portion of the total variation
$\mathrm{tr}(\hat\gamma) = \sum_r \hat\lambda_r$ and $\hat\lambda_r$, $r = \hat m + 1, \hat m + 2, \ldots$,
are essentially 0. Besse [1] proposed a simple method for selecting such an
$\hat m$. A simple and natural choice of $\hat m$ is the number of positive eigenvalues
of $\hat\gamma(s, t)$. We found that the second method worked well in our simulation
study and in the real data application presented in the next two sections.
In the $\chi^2$-approximation method, the distribution of $S$ is approximated by
that of a random variable $R = \alpha \chi_d^2 + \beta$ via matching the first three
cumulants of $R$ and $S$ to determine the unknown parameters $\alpha$, $d$ and $\beta$ [5, 29].
In the simulation method, the sampling distribution of $S$ is computed based
on a sample of $S$ obtained via repeatedly generating $(A_1, A_2, \ldots, A_{\hat m})$. The
bootstrap method is slightly more complicated. In the bootstrap method,
we generate a sample of subject effects $v_i^*(t)$, $i = 1, 2, \ldots, n$, from the
estimated subject effects $\hat v_i(t)$, $i = 1, 2, \ldots, n$, under $H_1$ and then construct a
bootstrap sample, $f_i^*(t) = x_i^T \hat\beta_0(t) + v_i^*(t)$, $i = 1, 2, \ldots, n$, where $\hat\beta_0(t)$ is the
estimator of $\beta(t)$ under $H_0$ so that $C \hat\beta_0(t) = c(t)$. Let $\hat\beta^*(t)$ be the
bootstrap estimator of $\beta(t)$ based on the above bootstrap sample. We then use
it to compute

$T_n^* = \int_a^b \|w^*(t)\|^2\, dt = \sum_{l=1}^{k} \int_a^b w_l^{*2}(t)\, dt,$

where $w^*(t)$ can be obtained by replacing $\hat\beta(t)$ with $\hat\beta^*(t)$ in the definition
(3.8) of $w(t)$. The bootstrap null distribution of $T_n$ is obtained by the
sampling distribution of $T_n^*$ via $B$ replications of the above bootstrap process
for some large $B$, for example, $B = 10{,}000$.
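The simulation method described above can be sketched as follows. This is an illustration under assumptions: the covariance estimate is available on an equally spaced grid with spacing `dt`, the eigenvalues of the covariance operator are approximated by those of the discretized kernel, and the names `simulate_null_mixture` and `p_value` are hypothetical.

```python
import numpy as np

def simulate_null_mixture(gamma_hat, dt, k, n_draws=10000, tol=1e-10, seed=0):
    """Draw from S = sum_r lambda_r A_r with independent A_r ~ chi^2_k.

    gamma_hat: (M, M) covariance estimate on an equally spaced grid.
    The eigenvalues of gamma_hat * dt approximate those of the operator.
    """
    lam = np.linalg.eigvalsh(gamma_hat * dt)
    lam = lam[lam > tol * lam.max()]        # keep the numerically positive ones
    rng = np.random.default_rng(seed)
    A = rng.chisquare(df=k, size=(n_draws, lam.size))
    return A @ lam

def p_value(T_obs, S_draws):
    """Proportion of simulated null values at least as large as T_obs."""
    return float(np.mean(S_draws >= T_obs))
```

Keeping only eigenvalues above a small relative tolerance corresponds to taking $\hat m$ as the number of positive eigenvalues; a stricter cutoff based on explained variation could be substituted.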

4. A simulation study. In this section we aim to investigate the effect
of the bandwidth selected by the GCV rule (2.16) on the average MSE
(2.5) of the $p$-order LPK reconstructions $\hat f_i(t)$, $i = 1, 2, \ldots, n$, and the MSE
of the mean function estimator $\hat\eta(t)$ via a simulation study. We generated
simulation samples from the model

$y_i(t) = \eta(t) + v_i(t) + \varepsilon_i(t),$
$\eta(t) = a_0 + a_1 \phi_1(t) + a_2 \phi_2(t),$
$v_i(t) = b_{i0} + b_{i1} \psi_1(t) + b_{i2} \psi_2(t),$
$b_i = [b_{i0}, b_{i1}, b_{i2}]^T \sim N[0, \mathrm{diag}(\sigma_0^2, \sigma_1^2, \sigma_2^2)],$
$\varepsilon_i(t) \sim N[0, \sigma_\varepsilon^2 (1 + t)], \qquad i = 1, 2, \ldots, n,$

where $n$ is the number of subjects and $b_i$ and $\varepsilon_i(t)$ are independent. The
scheduled design time points are $t_j = j/(m + 1)$, $j = 1, 2, \ldots, m$. To obtain
an unbalanced design which is more realistic, we randomly removed some
responses on a subject at a rate $r_{\mathrm{miss}}$ so that on average there are about
$m(1 - r_{\mathrm{miss}})$ measurements on a subject, and $nm(1 - r_{\mathrm{miss}})$ measurements in
a whole simulated sample. For simplicity, in this simulation the parameters
we actually used are $[a_0, a_1, a_2] = [1.2, 2.3, 4.2]$,
$[\sigma_0^2, \sigma_1^2, \sigma_2^2, \sigma_\varepsilon^2] = [1, 2, 3, 0.1]$,
$\phi_1(t) = \psi_1(t) = \cos(2\pi t)$, $\phi_2(t) = \psi_2(t) = \sin(2\pi t)$,
$r_{\mathrm{miss}} = 10\%$, $m = 40$ and $n = 20$, 30 and 40.
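One simulated sample from this design can be generated along the following lines. The sketch uses the parameter values listed above; the function name `simulate_sample`, the seeding, and the independent per-point removal mechanism are assumptions, since the text does not specify how responses were removed beyond the rate.

```python
import numpy as np

def simulate_sample(n, m=40, r_miss=0.10, seed=0):
    """Generate one unbalanced sample (t_list, y_list) from the simulation model."""
    rng = np.random.default_rng(seed)
    a = np.array([1.2, 2.3, 4.2])           # [a_0, a_1, a_2]
    var_b = np.array([1.0, 2.0, 3.0])       # variances of b_i0, b_i1, b_i2
    var_eps = 0.1                           # sigma_eps^2
    t_full = np.arange(1, m + 1) / (m + 1)  # scheduled points t_j = j/(m+1)

    def basis(t):
        return np.column_stack(
            [np.ones_like(t), np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)]
        )

    t_list, y_list = [], []
    for _ in range(n):
        keep = rng.uniform(size=m) >= r_miss            # drop each point w.p. r_miss
        t = t_full[keep]
        b = rng.normal(0.0, np.sqrt(var_b))             # subject-specific coefficients
        eps = rng.normal(0.0, np.sqrt(var_eps * (1.0 + t)))
        y_list.append(basis(t) @ (a + b) + eps)         # eta(t) + v_i(t) + eps_i(t)
        t_list.append(t)
    return t_list, y_list
```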
For a simulated sample, the $p$-order LPK reconstructions $\hat f_i(t)$ were obtained using a local linear (i.e., $p = 1$) smoother [8, 9] with the well-known Gaussian kernel. We considered five bandwidth choices, $0.5h^*$, $0.8h^*$, $h^*$, $1.25h^*$ and $2h^*$, where $h^*$ is the bandwidth selected by the GCV rule (2.16). For a simulated sample, the average MSE for $\hat f_i(t)$ and the MSE for the mean function estimator $\hat\eta(t)$ were computed respectively as

$$\mathrm{MSE}_f = (nM)^{-1}\sum_{i=1}^{n}\sum_{j=1}^{M}\{\hat f_i(\tau_j) - f_i(\tau_j)\}^2 \quad\text{and}\quad \mathrm{MSE}_\eta = M^{-1}\sum_{j=1}^{M}\{\hat\eta(\tau_j) - \eta(\tau_j)\}^2,$$

where $\tau_1, \ldots, \tau_M$ are $M$ time points equally spaced in $[0,1]$, for some large $M$, for example, $M = 400$.
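A local linear fit with a Gaussian kernel and the grid-based MSE can be sketched as below. This is a minimal illustration, not the authors' code: the names `local_linear_fit` and `average_mse` are hypothetical, the bandwidth is passed in rather than chosen by GCV, and the evaluation grid is assumed to include both endpoints of $[0,1]$.

```python
import numpy as np

def local_linear_fit(t_obs, y_obs, t_eval, h):
    """Local linear (p = 1) kernel fit with a Gaussian kernel and bandwidth h."""
    t_obs = np.asarray(t_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    t_eval = np.asarray(t_eval, dtype=float)
    fitted = np.empty(t_eval.size)
    for k, t0 in enumerate(t_eval):
        d = t_obs - t0
        w = np.exp(-0.5 * (d / h) ** 2)               # Gaussian kernel weights
        design = np.column_stack([np.ones_like(d), d])
        weighted = design * w[:, None]
        # Weighted least squares; the intercept is the fitted value at t0.
        coef = np.linalg.solve(design.T @ weighted, weighted.T @ y_obs)
        fitted[k] = coef[0]
    return fitted

def average_mse(f_hat, f_true):
    """Mean squared difference over all subjects and grid points."""
    return float(np.mean((np.asarray(f_hat) - np.asarray(f_true)) ** 2))

tau = np.linspace(0.0, 1.0, 400)           # M = 400 evaluation points
t_obs = np.arange(1, 41) / 41              # scheduled design points with m = 40
fit_lin = local_linear_fit(t_obs, 2.0 + 3.0 * t_obs, tau, h=0.1)
```

A local linear smoother reproduces straight lines exactly, which gives a quick sanity check of the implementation before it is applied to noisy curves.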

Figure 1 presents the simulation results. The boxplots were based on 200 simulated samples. From left to right, panels are respectively for GCV, $\mathrm{MSE}_f$ and $\mathrm{MSE}_\eta$; from top to bottom, panels are respectively for $n = 20$, 30 and 40. In each of the panels, the first five boxplots are associated with the five bandwidth choices $0.5h^*$, $0.8h^*$, $h^*$, $1.25h^*$ and $2h^*$, respectively; the sixth boxplot in each of the $\mathrm{MSE}_\eta$ panels is associated with the "ideal" estimator $\tilde\eta(t)$; see (2.10) for its definition.

From Figure 1, we may conclude that (a) overall, the GCV rule (2.16) performed well in the sense of choosing proper bandwidths to minimize the average MSE (2.5); (b) bandwidths smaller than $h^*$ help reduce $\mathrm{MSE}_\eta$, but not by much, while bandwidths larger than $h^*$ enlarge $\mathrm{MSE}_\eta$ substantially; and (c) the $\mathrm{MSE}_\eta$ values based on $\hat\eta(t)$ and those based on the "ideal" estimator $\tilde\eta(t)$ are nearly the same unless the bandwidths are substantially larger than $h^*$.

5. Application to the Canadian temperature data. The Canadian tem- 
perature data (Canadian Climate Program [6]) were downloaded from 
ftp://ego.psych.mcgill.ca/pub/ramsay/FDAfuns/Matlab/ at the book 
website of Ramsay and Silverman [23, 24]. The data are the daily tempera- 
ture records of 35 Canadian weather stations over a year (365 days), among 
which 15 are in Eastern, another 15 in Western and the remaining five in 
Northern Canada. This is a typical functional data set with the number of 
measurements per subject ($n_i = 365$) being much larger than the number
of subjects (n = 35). We shall use this functional data set only to illustrate 
the methodologies developed in this paper. For a more formal analysis, this 
functional data set should be first registered using either a parametric curve 
registration method proposed by Silverman [26] or a more flexible nonpara- 
metric curve registration method developed by Ramsay and Li [22]. Our 
methodologies can then be applied similarly to the resulting registered func- 
tional data set. 

Fig. 1. Simulation results. From left to right, panels are, respectively, for GCV, $\mathrm{MSE}_f$ and $\mathrm{MSE}_\eta$; from top to bottom, panels are, respectively, for $n = 20$, 30 and 40. In each of the panels, the first five boxplots are associated with the five bandwidth choices $0.5h^*$, $0.8h^*$, $h^*$, $1.25h^*$ and $2h^*$, where $h^*$ is the GCV bandwidth; the sixth boxplot in an $\mathrm{MSE}_\eta$ panel is associated with the "ideal" estimator $\tilde\eta(t)$.

Figure 2 presents the individual curve reconstructions of the Canadian temperature data. These reconstructions were obtained by applying the local linear ($p = 1$) kernel fit [8, 9] with the well-known Gaussian kernel to the individual temperature records of each of the 35 weather stations, but with a common bandwidth $h^* = 2.79$, selected by the GCV rule (2.16). It can be seen that the Eastern weather station temperature curves (solid) mix with the Western weather station temperature curves (dot-dashed), but most of the Eastern and Western weather station temperature curves stay higher than the Northern weather station temperature curves (dashed). This is reasonable since the Eastern and Western weather stations are located at about the same latitudes, while the Northern weather stations are located at higher latitudes.

Fig. 2. Local linear ($p = 1$) individual curve reconstructions of the Canadian temperature data with the bandwidth $h^* = 2.79$, selected by GCV. Eastern weather stations: solid curves; Western weather stations: dot-dashed curves; and Northern weather stations: dashed curves.



We then modeled the Canadian temperature data set by the functional linear model (1.3) with the covariates

$$x_i = \begin{cases} [1, 0, 0]^T, & \text{if weather station } i \text{ is located in Eastern Canada},\\ [0, 1, 0]^T, & \text{if weather station } i \text{ is located in Western Canada},\\ [0, 0, 1]^T, & \text{if weather station } i \text{ is located in Northern Canada}, \end{cases} \qquad i = 1, 2, \ldots, 35,$$

and the coefficient function vector $\beta(t) = [\beta_1(t), \beta_2(t), \beta_3(t)]^T$, where $\beta_1(t)$, $\beta_2(t)$ and $\beta_3(t)$ are the covariate effect (mean temperature) functions of the Eastern, Western and Northern weather stations, respectively.
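With these indicator covariates, the least-squares form $(X^TX)^{-1}X^T\hat f(t)$ used for the covariate effects in Section 6 reduces to the pointwise group means of the reconstructed curves. The sketch below illustrates this; the ordering of the stations (15 Eastern, then 15 Western, then 5 Northern) and the random placeholder curves are assumptions made here for illustration and are not the real temperature data.

```python
import numpy as np

# Region labels: 0 = Eastern, 1 = Western, 2 = Northern (assumed ordering).
regions = np.array([0] * 15 + [1] * 15 + [2] * 5)
X = np.eye(3)[regions]                     # 35 x 3 design matrix of indicator rows

# Placeholder for the reconstructed curves on a daily grid (NOT the real data).
rng = np.random.default_rng(1)
F_hat = rng.normal(size=(35, 365))

# Pointwise least squares: one 3-vector of coefficients for each day.
beta_hat = np.linalg.solve(X.T @ X, X.T @ F_hat)   # 3 x 365
```

Each row of `beta_hat` is the average of the curves in the corresponding region, which is why the coefficient functions can be read as regional mean temperature functions.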

Figure 3 superimposes the estimated mean temperature functions of the Eastern, Western and Northern weather stations, together with their 95% standard deviation bands. Based on the 95% standard deviation bands, some informal conclusions can be made. First of all, over the whole year ($[a,b] = [1,365]$), the differences between the mean temperature functions of the Eastern and the Western weather stations are much less significant than the differences between the mean temperature functions of the Eastern and the Northern weather stations, or between the Western and the Northern weather stations. This is because the 95% standard deviation band of the Eastern weather station mean temperature function covers (before Day 151) or stays close to (after Day 151) the mean temperature function of the Western weather stations; however, the 95% standard deviation bands of the Eastern and Western weather station mean temperature functions are far away from the mean temperature function of the Northern weather stations. Second, the significance of the differences between the mean temperature functions of the Eastern and the Western weather stations varies by season. During the Spring (usually defined as the months of March, April and May, or $[a,b] = [60,151]$), the mean temperature functions are nearly the same, but this is not the case during the Summer (June, July and August, or $[a,b] = [152,243]$) or during the Autumn (September, October and November, or $[a,b] = [244,334]$). These conclusions can be made clearer via the hypothesis testing problem (3.6) with $t \in T = [a,b]$ using the global test statistic $T_n$ (3.11) and with $a$, $b$, $c$ and $C$ properly specified. For example, to test if the mean temperature functions of the Eastern and Western weather stations during the Spring are the same, we take $a = 60$, $b = 151$, $c = 0$ and $C = [1, -1, 0]$; and to test if the mean temperature functions of the Eastern, Western and Northern weather stations during the Autumn are the same, we take $a = 244$, $b = 334$, $c = [0, 0]^T$ and

$$C = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \end{bmatrix}.$$

Fig. 3. Estimated mean temperature functions of the Eastern, Western and Northern weather stations with 95% standard deviation bands (Eastern weather stations, solid; Western weather stations, dot-dashed; and Northern weather stations, dashed).

Table 1
Significance test results for the differences of the mean temperature functions of the Eastern and Western weather stations based on 10,000 replications

                                                 P-values
[a, b]                    h       T_n      χ²-approximation   Simulation   Bootstrapping
[1, 365] (Whole year)     h*/2    59954    0.179              0.179        0.166
                          h*      58248    0.185              0.181        0.180
                          2h*     56868    0.189              0.185        0.184
[60, 151] (Spring)        h*/2      945    0.842              0.836        0.834
                          h*        656    0.940              0.874        0.877
                          2h*       378    1.000              0.923        0.922
[152, 243] (Summer)       h*/2     6625    0.078              0.075        0.068
                          h*       6432    0.082              0.084        0.083
                          2h*      6322    0.085              0.086        0.075
[244, 334] (Autumn)       h*/2    28748    0.011              0.011        0.009
                          h*      28303    0.012              0.013        0.008
                          2h*     27526    0.014              0.015        0.010
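The role of $a$, $b$, $c$ and $C$ can be illustrated with a small numerical sketch of the statistic, using the form $w(t) = [C(X^TX)^{-1}C^T]^{-1/2}[C\hat\beta(t) - c]$ recalled in the proof of Theorem 7 and $T_n = \int_a^b \|w(t)\|^2\,dt$. Several things here are assumptions rather than the authors' implementation: the function name `global_test_statistic`, a constant $c$, and the approximation of the integral by a sum over daily grid points with unit spacing.

```python
import numpy as np

def global_test_statistic(beta_hat, X, C, c, days, a, b):
    """Approximate T_n = integral over [a, b] of ||w(t)||^2 by a sum over daily points."""
    C = np.atleast_2d(np.asarray(C, dtype=float))
    M = C @ np.linalg.inv(X.T @ X) @ C.T                # k x k matrix C (X'X)^{-1} C'
    mask = (days >= a) & (days <= b)
    resid = C @ beta_hat[:, mask] - np.reshape(c, (-1, 1))
    # ||M^{-1/2} r||^2 equals r' M^{-1} r, so no matrix square root is needed.
    quad = np.einsum("kt,kl,lt->t", resid, np.linalg.inv(M), resid)
    return float(np.sum(quad))                          # unit (one-day) spacing assumed

# Toy inputs for illustration only (NOT the fitted temperature curves).
days = np.arange(1, 366)
X_demo = np.eye(3)[np.array([0] * 15 + [1] * 15 + [2] * 5)]
beta_demo = np.vstack([np.ones(365), np.zeros(365), np.full(365, 5.0)])

t_spring = global_test_statistic(beta_demo, X_demo, [1, -1, 0], 0.0, days, 60, 151)
C_all = [[1, 0, -1], [0, 1, -1]]
t_equal = global_test_statistic(np.ones((3, 365)), X_demo, C_all, [0.0, 0.0], days, 244, 334)
```

The first call compares only the Eastern and Western coefficient functions over the Spring window; the second uses the two-row contrast matrix to compare all three regions over the Autumn window.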



We first tested the differences of the mean temperature functions of the Eastern and Western Canadian weather stations for the whole year, and during the Spring, Summer and Autumn. Table 1 shows the significance test results, where the simulation and bootstrap P-values were computed based on 10,000 replications. For each choice of the seasonal period $[a,b]$, we used three different bandwidth choices, $h^*/2$, $h^*$ and $2h^*$, where $h^* = 2.79$ was selected by the GCV rule (2.16). For each bandwidth choice, the associated test statistic $T_n$ was computed using (3.11). For each $T_n$, we computed its P-value using the $\chi^2$-approximation, simulation and bootstrap methods which were described briefly in Section 3.2. Figure 4 displays the null probability density function (p.d.f.) approximations obtained using the three methods. It seems that all three approximations perform reasonably well except at the left boundary where the $\chi^2$-approximations seem problematic. Nevertheless, from the table, we can see that the significance test results are not strongly affected by the bandwidths used; moreover, we can see that the differences between the mean temperature functions of the Eastern and Western weather stations over the whole year (P-value $\ge 0.166$) are more significant than their differences during the Spring (P-value $\ge 0.834$), but much less significant than their differences during the Summer (P-value $\le 0.086$) or during the Autumn (P-value $\le 0.015$). These results are consistent with those observed from Figure 3.

Fig. 4. Null p.d.f. approximations ($\chi^2$-approximation, solid; simulation, dashed; bootstrap, dotted) of the global test statistic $T_n$ (3.11) when $h^* = 2.79$. (a) $[a,b] = [1,365]$; (b) $[a,b] = [60,151]$; (c) $[a,b] = [152,243]$; and (d) $[a,b] = [244,334]$.

Following the same procedure, we also tested the following null hypotheses: the mean temperature functions are the same between (1) the Eastern and Northern; (2) the Western and Northern; and (3) the Eastern, Western and Northern weather stations for the following periods: (1) the whole year; (2) the Spring; (3) the Summer; and (4) the Autumn. As expected, we rejected all these null hypotheses with P-value 0. These results are also consistent with those observed from Figure 3.

6. Technical proofs. In this section we outline the technical proofs of some of the asymptotic results. Before we proceed, we list the following useful lemmas. The proof of the first lemma can be found in [10], page 64. Notice that, under Condition A4, "$n \to \infty$" implies that "$n_i \to \infty$."

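Lemma 1. Assume Condition A is satisfied. Then as $n \to \infty$, we have

$$K_h^{n_i}(t_{ij} - t) = \frac{1}{n_i\pi(t)}K_h^*(t_{ij} - t)[1 + o_P(1)],$$

where $K^*(\cdot)$ is the LPK equivalent kernel ([10], page 64).

Lemma 2. We always have

$$\sum_{j=1}^{n_i} K_h^{n_i}(t_{ij} - t)(t_{ij} - t)^r = \begin{cases} 1, & \text{when } r = 0,\\ 0, & \text{otherwise.} \end{cases}$$

Assume Condition A is satisfied. Then as $n \to \infty$, we have

$$\sum_{j=1}^{n_i} K_h^{n_i}(t_{ij} - t)(t_{ij} - t)^{p+1} = B_{p+1}(K^*)h^{p+1}[1 + o_P(1)],$$

$$\sum_{j=1}^{n_i} \{K_h^{n_i}(t_{ij} - t)\}^2 = \frac{V(K^*)}{\pi(t)}(n_ih)^{-1}[1 + o_P(1)],$$

$$\sum_{j=1}^{n_i} K_h^{n_i}(t_{ij} - s)K_h^{n_i}(t_{ij} - t) = \frac{K^{*(2)}[(s-t)/h]}{\pi(t)}(n_ih)^{-1}[1 + o_P(1)],$$

where $B_r(\cdot)$ and $V(\cdot)$ are defined in (2.4).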

Let $r_i(t) = \hat f_i(t) - f_i(t)$, $i = 1, 2, \ldots, n$, where $\hat f_i(t)$ are the $p$-order LPK reconstructions of $f_i(t)$ given in Section 2.1. Let $\bar r(t) = n^{-1}\sum_{i=1}^n r_i(t)$ and $\bar f(t) = n^{-1}\sum_{i=1}^n f_i(t)$. Using Lemmas 1 and 2, we can prove the following useful lemma.

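Lemma 3. Assume Condition A is satisfied. Then as $n \to \infty$, we have

$$E\{r_i(t)\,|\,\mathcal{D}\} = \frac{B_{p+1}(K^*)\eta^{(p+1)}(t)}{(p+1)!}h^{p+1}[1 + o_P(1)],$$

$$\mathrm{Cov}\{r_i(s), r_i(t)\,|\,\mathcal{D}\} = \left\{\frac{K^{*(2)}[(s-t)/h]\sigma^2(s)}{\pi(t)}(n_ih)^{-1} + \frac{B_{p+1}^2(K^*)\gamma_{p+1,p+1}(s,t)}{(p+1)!^2}h^{2(p+1)}\right\}[1 + o_P(1)],$$

$$\mathrm{Cov}\{r_i(s), f_i(t)\,|\,\mathcal{D}\} = \frac{B_{p+1}(K^*)\gamma_{p+1,0}(s,t)}{(p+1)!}h^{p+1}[1 + o_P(1)].$$

Proof. By (2.3) and Lemma 1, we have

$$r_i(t) = \sum_{j=1}^{n_i} K_h^{n_i}(t_{ij} - t)\epsilon_{ij} + \sum_{j=1}^{n_i} K_h^{n_i}(t_{ij} - t)\{f_i(t_{ij}) - f_i(t)\}.$$

It follows that

$$E(r_i(t)\,|\,\mathcal{D}) = \sum_{j=1}^{n_i} K_h^{n_i}(t_{ij} - t)\{\eta(t_{ij}) - \eta(t)\}.$$

Applying Taylor's expansion and Lemmas 1 and 2, we have

$$E(r_i(t)\,|\,\mathcal{D}) = \sum_{j=1}^{n_i} K_h^{n_i}(t_{ij} - t)\left\{\sum_{l=1}^{p+1}\frac{\eta^{(l)}(t)}{l!}(t_{ij} - t)^l + o[(t_{ij} - t)^{p+1}]\right\} = \frac{B_{p+1}(K^*)\eta^{(p+1)}(t)}{(p+1)!}h^{p+1}[1 + o_P(1)]. \tag{6.1}$$

Similarly, by the independence of $f_i(t)$ and $\epsilon_i(t)$, we have

$$\begin{aligned} \mathrm{Cov}(r_i(s), r_i(t)\,|\,\mathcal{D}) &= \sum_{j=1}^{n_i} K_h^{n_i}(t_{ij} - s)K_h^{n_i}(t_{ij} - t)\sigma^2(t_{ij})\\ &\quad + \sum_{j=1}^{n_i}\sum_{l=1}^{n_i} K_h^{n_i}(t_{ij} - s)K_h^{n_i}(t_{il} - t)\{\gamma(t_{ij}, t_{il}) - \gamma(t_{ij}, t) - \gamma(s, t_{il}) + \gamma(s,t)\}\\ &= \left\{\frac{K^{*(2)}[(s-t)/h]\sigma^2(s)}{\pi(t)}(n_ih)^{-1} + \frac{B_{p+1}^2(K^*)\gamma_{p+1,p+1}(s,t)}{(p+1)!^2}h^{2(p+1)}\right\}[1 + o_P(1)]. \end{aligned} \tag{6.2}$$

In particular, letting $s = t$, we obtain

$$\mathrm{Var}(r_i(t)\,|\,\mathcal{D}) = \left\{\frac{V(K^*)\sigma^2(t)}{\pi(t)}(n_ih)^{-1} + \frac{B_{p+1}^2(K^*)\gamma_{p+1,p+1}(t,t)}{(p+1)!^2}h^{2(p+1)}\right\}[1 + o_P(1)], \tag{6.3}$$

as desired. Lemma 3 is proved. □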

Direct application of Lemma 3 leads to the following. 

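Lemma 4. Assume Condition A is satisfied. Then as $n \to \infty$, we have

$$E(\bar r(t)\,|\,\mathcal{D}) = \frac{B_{p+1}(K^*)\eta^{(p+1)}(t)}{(p+1)!}h^{p+1}[1 + o_P(1)],$$

$$\mathrm{Cov}(\bar r(s), \bar r(t)\,|\,\mathcal{D}) = n^{-1}\left\{\frac{K^{*(2)}[(s-t)/h]\sigma^2(s)}{\pi(t)}(\tilde m h)^{-1} + \frac{B_{p+1}^2(K^*)\gamma_{p+1,p+1}(s,t)}{(p+1)!^2}h^{2(p+1)}\right\}[1 + o_P(1)],$$

$$\mathrm{Cov}(\bar r(s), \bar f(t)\,|\,\mathcal{D}) = n^{-1}\frac{B_{p+1}(K^*)\gamma_{p+1,0}(s,t)}{(p+1)!}h^{p+1}[1 + o_P(1)],$$

where $\tilde m = (n^{-1}\sum_{i=1}^n n_i^{-1})^{-1}$, as defined in Theorem 2.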



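Proof of Theorem 1. For each $i = 1, 2, \ldots, n$, by (6.1) and (6.3), we have

$$\begin{aligned} E\{[\hat f_i(t) - f_i(t)]^2\,|\,\mathcal{D}\} &= E\{r_i^2(t)\,|\,\mathcal{D}\} = \{E(r_i(t)\,|\,\mathcal{D})\}^2 + \mathrm{Var}(r_i(t)\,|\,\mathcal{D})\\ &= \left\{\frac{B_{p+1}^2(K^*)[(\eta^{(p+1)}(t))^2 + \gamma_{p+1,p+1}(t,t)]}{(p+1)!^2}h^{2(p+1)} + \frac{V(K^*)\sigma^2(t)}{\pi(t)}(n_ih)^{-1}\right\}[1 + o_P(1)]. \end{aligned} \tag{6.4}$$

Theorem 1, that is, the expression (2.5), then follows directly. □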

Proof of Theorem 2. Under Condition A, the coefficients of $h^{2(p+1)}$ and $(n_ih)^{-1}$ in the expression (6.4) are uniformly bounded over the finite interval $T = [a,b]$. Moreover, since $n_i \ge Cn^{\delta}$ and $h = O(n^{-\delta/(2p+3)})$, we have $O(h^{2(p+1)}) = O\{(n_ih)^{-1}\} = O(n^{-2(p+1)\delta/(2p+3)}) = n^{-2(p+1)\delta/(2p+3)}O(1)$. Thus, $E\{r_i^2(t)\,|\,\mathcal{D}\} = O_{UP}[h^{2(p+1)} + (n_ih)^{-1}] = n^{-2(p+1)\delta/(2p+3)}O_{UP}(1)$. Therefore, $\hat f_i(t) = f_i(t) + n^{-(p+1)\delta/(2p+3)}O_{UP}(1)$. Theorem 2 is then proved. □
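As a worked instance of this rate (an illustration added here, not part of the original proof), take the local linear case $p = 1$ used in Sections 4 and 5. The condition $\delta > 1 + 1/[2(p+1)]$ that appears in the proofs of Theorems 4, 6 and 7 is exactly what makes the reconstruction error negligible at the $\sqrt{n}$ scale:

```latex
% Worked instance for p = 1 (local linear reconstruction)
\hat f_i(t) - f_i(t) = n^{-2\delta/5}\,O_{UP}(1),
\qquad
\sqrt{n}\,n^{-2\delta/5} = n^{1/2 - 2\delta/5} \to 0
\iff \frac{2\delta}{5} > \frac{1}{2}
\iff \delta > \frac{5}{4} = 1 + \frac{1}{2(p+1)}\bigg|_{p=1}.
```

For general $p$ the same algebra gives $(p+1)\delta/(2p+3) > 1/2$ if and only if $\delta > (2p+3)/[2(p+1)] = 1 + 1/[2(p+1)]$.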

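Proof of Theorem 3. First of all, notice that $\hat\eta(t) = n^{-1}\sum_{i=1}^n \hat f_i(t) = \bar f(t) + \bar r(t)$. It follows that $\mathrm{Bias}(\hat\eta(t)\,|\,\mathcal{D}) = E(\bar r(t)\,|\,\mathcal{D})$ and $\mathrm{Cov}(\hat\eta(s), \hat\eta(t)\,|\,\mathcal{D}) = \mathrm{Cov}(\bar f(s), \bar f(t)\,|\,\mathcal{D}) + \mathrm{Cov}(\bar f(s), \bar r(t)\,|\,\mathcal{D}) + \mathrm{Cov}(\bar r(s), \bar f(t)\,|\,\mathcal{D}) + \mathrm{Cov}(\bar r(s), \bar r(t)\,|\,\mathcal{D})$. The results of Theorem 3 follow directly from Lemma 4. □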

Proof of Theorem 4. Since $\hat\eta(t) = \bar f(t) + \bar r(t) = \tilde\eta(t) + \bar r(t)$, in order to show the first expression in (2.11), it is sufficient to prove that $E\{\bar r^2(t)\,|\,\mathcal{D}\} = n^{-2(p+1)\delta/(2p+3)}O_{UP}(1)$. This result follows directly from $E\{\bar r^2(t)\,|\,\mathcal{D}\} = \{E(\bar r(t)\,|\,\mathcal{D})\}^2 + \mathrm{Var}(\bar r(t)\,|\,\mathcal{D})$ and Lemma 4. To show the second expression in (2.11), notice that the covariance estimator $\hat\gamma(s,t)$ can be expressed as

$$\begin{aligned} \hat\gamma(s,t) &= \frac{1}{n}\sum_{i=1}^n \{f_i(s) - \bar f(s)\}\{f_i(t) - \bar f(t)\} + \frac{1}{n}\sum_{i=1}^n \{f_i(s) - \bar f(s)\}\{r_i(t) - \bar r(t)\}\\ &\quad + \frac{1}{n}\sum_{i=1}^n \{r_i(s) - \bar r(s)\}\{f_i(t) - \bar f(t)\} + \frac{1}{n}\sum_{i=1}^n \{r_i(s) - \bar r(s)\}\{r_i(t) - \bar r(t)\}\\ &= \tilde\gamma(s,t) + I_1 + I_2 + I_3, \end{aligned}$$

where $r_i(t) = \hat f_i(t) - f_i(t)$, $i = 1, 2, \ldots, n$, are independent and asymptotically have the same variance. By the law of large numbers and by Lemma 3, we have

$$\begin{aligned} I_1 &= E\left\{n^{-1}\sum_{i=1}^n E[(f_i(s) - \bar f(s))(r_i(t) - \bar r(t))\,|\,\mathcal{D}]\right\}O_P(1)\\ &= E\{\mathrm{Cov}(f_1(s), r_1(t)\,|\,\mathcal{D})\}O_P(1)\\ &= n^{-(p+1)\delta/(2p+3)}O_{UP}(1). \end{aligned}$$

Similarly, we can show that

$$I_2 = n^{-(p+1)\delta/(2p+3)}O_{UP}(1) \quad\text{and}\quad I_3 = n^{-2(p+1)\delta/(2p+3)}O_{UP}(1).$$

The second expression in (2.11) then follows. When $\delta > 1 + 1/[2(p+1)]$, we have

$$n^{1/2}\{\hat\eta(t) - \tilde\eta(t)\} = o_{UP}(1), \qquad n^{1/2}\{\hat\gamma(s,t) - \tilde\gamma(s,t)\} = o_{UP}(1).$$

By the definition of $\tilde\eta(t)$ and $\tilde\gamma(s,t)$, we have

$$\tilde\eta(t) = \eta(t) + \bar v(t), \qquad \tilde\gamma(s,t) = n^{-1}\sum_{i=1}^n v_i(s)v_i(t) - \bar v(s)\bar v(t).$$

By the law of large numbers and the central limit theorem, it is easy to show that

$$n^{1/2}\{\tilde\eta(t) - \eta(t)\} \sim \mathrm{AGP}(0, \gamma), \qquad n^{1/2}\{\tilde\gamma(s,t) - \gamma(s,t)\} \sim \mathrm{AGP}(0, \gamma^*),$$

where

$$\begin{aligned} \gamma^*\{(s_1,t_1), (s_2,t_2)\} &= \mathrm{Cov}\{v_1(s_1)v_1(t_1), v_1(s_2)v_1(t_2)\}\\ &= E\{v_1(s_1)v_1(t_1)v_1(s_2)v_1(t_2)\} - \gamma(s_1,t_1)\gamma(s_2,t_2). \end{aligned}$$

In particular, when $v(t)$ is a Gaussian process, we have

$$E\{v_1(s_1)v_1(t_1)v_1(s_2)v_1(t_2)\} = \gamma(s_1,t_1)\gamma(s_2,t_2) + \gamma(s_1,t_2)\gamma(s_2,t_1) + \gamma(s_1,s_2)\gamma(t_1,t_2).$$

Thus, $\gamma^*\{(s_1,t_1), (s_2,t_2)\} = \gamma(s_1,t_2)\gamma(s_2,t_1) + \gamma(s_1,s_2)\gamma(t_1,t_2)$. The proof of Theorem 4 is finished. □
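The Gaussian fourth-moment identity used in the last step can be checked numerically. The sketch below is a Monte Carlo illustration only; the covariance function $\gamma(s,t) = \exp(-|s-t|)$ and the four time points are arbitrary choices made here, not quantities from the paper.

```python
import numpy as np

def gamma_demo(s, t):
    """Hypothetical covariance function, chosen only for illustration."""
    return np.exp(-np.abs(s - t))

pts = np.array([0.1, 0.4, 0.6, 0.9])           # s1, t1, s2, t2
G = gamma_demo(pts[:, None], pts[None, :])      # 4 x 4 covariance matrix

# gamma(s1,t1)gamma(s2,t2) + gamma(s1,t2)gamma(s2,t1) + gamma(s1,s2)gamma(t1,t2)
exact = G[0, 1] * G[2, 3] + G[0, 3] * G[2, 1] + G[0, 2] * G[1, 3]

rng = np.random.default_rng(0)
v = rng.multivariate_normal(np.zeros(4), G, size=500_000)
monte_carlo = float(np.mean(np.prod(v, axis=1)))  # estimate of E{v(s1)v(t1)v(s2)v(t2)}
```

The two numbers should agree up to Monte Carlo error; subtracting $\gamma(s_1,t_1)\gamma(s_2,t_2)$ from either gives the corresponding value of $\gamma^*$.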

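Proof of Theorem 5. Under Condition A and by Theorem 2, we have $\hat f_i(t_{ij}) = f_i(t_{ij}) + n^{-(p+1)\delta/(2p+3)}O_{UP}(1)$. It follows that

$$\begin{aligned} \hat\epsilon_{ij}^2 = \{y_{ij} - \hat f_i(t_{ij})\}^2 &= \{\epsilon_{ij} + n^{-(p+1)\delta/(2p+3)}O_{UP}(1)\}^2\\ &= \epsilon_{ij}^2 + 2n^{-(p+1)\delta/(2p+3)}\epsilon_{ij}O_{UP}(1) + n^{-2(p+1)\delta/(2p+3)}O_{UP}(1). \end{aligned}$$

Plugging this into (2.14) with $b = O(N^{-1/5})$, we have $\hat\sigma^2(t) = I_1 + I_2 + I_3$, where under the given conditions and by standard kernel estimation theory,

$$I_1 = \frac{\sum_{i=1}^n\sum_{j=1}^{n_i} H_b(t_{ij} - t)\epsilon_{ij}^2}{\sum_{i=1}^n\sum_{j=1}^{n_i} H_b(t_{ij} - t)} = \sigma^2(t) + O_{UP}(N^{-2/5}),$$

$$I_2 = \frac{\sum_{i=1}^n\sum_{j=1}^{n_i} H_b(t_{ij} - t)\,2\epsilon_{ij}\,n^{-(p+1)\delta/(2p+3)}O_{UP}(1)}{\sum_{i=1}^n\sum_{j=1}^{n_i} H_b(t_{ij} - t)} = n^{-(p+1)\delta/(2p+3)}O_{UP}(1),$$

$$I_3 = \frac{\sum_{i=1}^n\sum_{j=1}^{n_i} H_b(t_{ij} - t)\,n^{-2(p+1)\delta/(2p+3)}O_{UP}(1)}{\sum_{i=1}^n\sum_{j=1}^{n_i} H_b(t_{ij} - t)} = n^{-2(p+1)\delta/(2p+3)}O_{UP}(1).$$

Under Condition A4, $n_i \ge Cn^{\delta}$. This implies that $N = \sum_{i=1}^n n_i \ge Cn^{1+\delta}$. Thus, $N^{-2/5} = O(n^{-2(1+\delta)/5})$. It follows that $\hat\sigma^2(t) = \sigma^2(t) + O_{UP}(n^{-2(1+\delta)/5} + n^{-(p+1)\delta/(2p+3)})$, as desired. The proof of the theorem is completed. □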

Proof of Theorem 6. Under the conditions of Theorem 2, we have $|r_i(t)| = |\hat f_i(t) - f_i(t)| \le n^{-(p+1)\delta/(2p+3)}C$ for some $C > 0$ for all $i$ and $t$. Let $\Delta(t) = [\Delta_1(t), \ldots, \Delta_q(t)]^T = (X^TX)^{-1}X^T(\hat f(t) - f(t))$. Then for $r = 1, 2, \ldots, q$, we have

$$|\Delta_r(t)| \le Cn^{-(p+1)\delta/(2p+3)}E|e_r^T\Omega^{-1}x_1|[1 + o_P(1)].$$

It follows that $\Delta(t) = n^{-(p+1)\delta/(2p+3)}O_{UP}(1)$. The first expression in (3.4) follows directly from the fact $\hat\beta(t) - \tilde\beta(t) = \Delta(t)$.

To show the second expression in (3.4), notice that $\hat v_i(t) = \tilde v_i(t) + r_i(t) + x_i^T[\tilde\beta(t) - \hat\beta(t)] = \tilde v_i(t) + n^{-(p+1)\delta/(2p+3)}O_{UP}(1)$ because under the given conditions, we have $x_i = O_{UP}(1)$, $r_i(t) = n^{-(p+1)\delta/(2p+3)}O_{UP}(1)$, and $\hat\beta(t) - \tilde\beta(t) = n^{-(p+1)\delta/(2p+3)}O_{UP}(1)$. Further, by Condition B, we have $\tilde v_i(t) = O_{UP}(1)$; therefore, $\hat v_i(s)\hat v_i(t) = \tilde v_i(s)\tilde v_i(t) + n^{-(p+1)\delta/(2p+3)}O_{UP}(1)$. The second expression in (3.4) follows immediately.

When $\delta > 1 + 1/[2(p+1)]$, we have $(p+1)\delta/(2p+3) > 1/2$. Therefore, $\sqrt{n}[\hat\beta(t) - \tilde\beta(t)] = n^{1/2-(p+1)\delta/(2p+3)}O_{UP}(1) = o_{UP}(1)$. Moreover, it is easy to show that

$$\sqrt{n}[\tilde\beta(t) - \beta(t)] \sim \mathrm{AGP}(0, \gamma_\beta), \tag{6.5}$$

where $\gamma_\beta(s,t) = \gamma(s,t)\Omega^{-1}$. The result in (3.5) follows immediately. The proof of the theorem is completed. □

Proof of Theorem 7. Recall that $w(t) = [C(X^TX)^{-1}C^T]^{-1/2}[C\hat\beta(t) - c(t)]$, as defined in (3.8). Define $\tilde w(t)$ similarly by replacing $\hat\beta(t)$ with $\tilde\beta(t)$. Then by (3.11), we have $T_n = \int_a^b \|w(t)\|^2\,dt$ and similarly, $\tilde T_n = \int_a^b \|\tilde w(t)\|^2\,dt$.

Let $\Delta(t) = w(t) - \tilde w(t) = [C(X^TX)^{-1}C^T]^{-1/2}C[\hat\beta(t) - \tilde\beta(t)]$. Then under the given conditions and by Theorem 6, we can show that $\Delta(t) = n^{1/2-(p+1)\delta/(2p+3)}O_{UP}(1)$. It follows that $w(t) = \tilde w(t) + n^{1/2-(p+1)\delta/(2p+3)}O_{UP}(1)$ and, hence, $T_n = \tilde T_n + 2\int_a^b \tilde w(t)^T\Delta(t)\,dt + \int_a^b \|\Delta(t)\|^2\,dt = \tilde T_n + n^{1/2-(p+1)\delta/(2p+3)}O_P(1)$, as desired.

When $\delta > 1 + 1/[2(p+1)]$, we have $T_n = \tilde T_n + o_P(1)$ as $n \to \infty$. Thus, to show (3.14), it is sufficient to show $\tilde T_n = \sum_{r=1}^m \lambda_rA_r + o_P(1)$. Using (6.5) in the proof of Theorem 6 above, it is easy to show that $\tilde w(t) \sim \mathrm{AGP}(\eta_w, \gamma_w)$, where $\eta_w(t) = \sqrt{n}(C\Omega^{-1}C^T)^{-1/2}[C\beta(t) - c(t)]$ and $\gamma_w(s,t) = \gamma(s,t)I_k$, as defined in (3.9). It follows that the $k$ components of $\tilde w(t)$ are independent of each other, and the $l$th component $\tilde w_l(t) \sim \mathrm{AGP}(\eta_{wl}, \gamma)$, where $\eta_{wl}(t)$ is the $l$th component of $\eta_w(t)$ as defined in (3.9). Since $\gamma(s,t)$ has the singular value decomposition (3.12), we have $\tilde w_l(t) = \sum_{r=1}^m \xi_{lr}\phi_r(t)$, where

$$\xi_{lr} = \int_a^b \tilde w_l(t)\phi_r(t)\,dt \sim \mathrm{AN}(\mu_{lr}, \lambda_r), \tag{6.6}$$

with $\mu_{lr} = \int_a^b \eta_{wl}(t)\phi_r(t)\,dt$. It follows that

$$\tilde T_n = \int_a^b \|\tilde w(t)\|^2\,dt = \sum_{l=1}^k \int_a^b \tilde w_l^2(t)\,dt = \sum_{l=1}^k\sum_{r=1}^m \xi_{lr}^2 = \sum_{r=1}^m\sum_{l=1}^k \xi_{lr}^2,$$

because the eigenfunctions $\phi_r(t)$ are orthonormal over $T = [a,b]$ and the summation is exchangeable due to the nonnegativity of $\xi_{lr}^2$. By (6.6), we have $\sum_{l=1}^k \xi_{lr}^2 = \lambda_rA_r$, where $A_r \sim \chi_k^2(u_r^2)$ with $u_r^2 = \lambda_r^{-1}\sum_{l=1}^k \mu_{lr}^2 = \lambda_r^{-1}\|\int_a^b \eta_w(t)\phi_r(t)\,dt\|^2$, as given in (3.15). It follows that $\tilde T_n = \sum_{r=1}^m \lambda_rA_r + o_P(1)$, as desired. The proof of the theorem is completed. □
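The limiting representation $\sum_{r} \lambda_rA_r$ suggests one way to approximate the null distribution of the statistic by direct simulation. The sketch below is only a plausible reading of such a simulation step, under stated assumptions: the eigenvalues are taken as given (in practice they would come from an estimate of $\gamma$), the noncentrality parameters $u_r^2$ are zero under the null hypothesis, and the function name `simulate_null_mixture` and the toy eigenvalues are hypothetical.

```python
import numpy as np

def simulate_null_mixture(eigvals, k, n_rep=10_000, seed=0):
    """Draws of sum_r lambda_r * A_r with independent A_r ~ chi-square with k d.f."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(eigvals, dtype=float)
    A = rng.chisquare(df=k, size=(n_rep, lam.size))   # central chi-squares (null case)
    return A @ lam

# Toy eigenvalues for illustration only.
draws = simulate_null_mixture([3.0, 1.0, 0.5], k=2)
t_obs_demo = 20.0
p_sim = float(np.mean(draws >= t_obs_demo))            # simulated P-value
```

The mean of the draws should be close to $k\sum_r \lambda_r$, which offers a simple check that the mixture has been set up as intended.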